GreenStat - Peter van Horssen
'02 maart, 2018'
Peter van Horssen www.greenstat.nl
background in physical geography & ecology
experience in data analysis/statistics/mapping in ecological studies
analysis : statistics, GIS, graphs, maps
Intro
Basics:
do stuff yourself
…. coffee break ….
do more stuff yourself
Why GIS in R ?
or maybe start with : why R?
R does not make your data-analysis problem easier/simple !
creation of reproducible workflow
scripting and documentation of workflow
Recently a boost in the available 'spatial' tools (based on 'simple features') All spatial analysis functionality (and more!) is available in R, plotting and layout for maps is at a basic level.
this presentation provides 'pointers' only
what is geo-data?
'simple features' is a data model with basic features for all spatial data:
This standard is 'under the hood' used in nearly all GIS packages (QGis, Esri, PostGIS, ..)
it also exports all OGC operations:
st_area, st_buffer, st_length, st_transform,….
r&d for small to midsize (in terms of data) analysis
very usefull for data-exploration (spatial, temporal, attributes)
keep the workflow on one platform
analysis in R, fancy (web)mapping somewhere else
Assume: User with basic knowledge of R
data.frame, x[] , str()
User with workable knowledge of spatial analysis and map projections
Software: R-core
packages :
Please download this before the meetup, R lives @ cran : https://cran.r-project.org/bin/windows/base/
Package are installed when R is running, choose 'Package|Install Packages' in the topbar, choose a cloudsource and select package name
sf : package to handle spatial vector data efficiently (https://r-spatial.github.io/sf/) tidyverse: alle you need data exploration tools (https://www.tidyverse.org/packages/)
test set simple example: points with coordinates
head(df)
ID var2 var1 dates X Y
1 1 0.3474181 D 2016-05-30 4.473945 52.03298
2 2 0.1076117 D 2016-08-20 5.034427 52.78106
3 3 0.4108196 D 2016-07-30 7.115377 52.75334
4 4 0.2413287 B 2016-06-25 5.033752 53.19410
5 5 0.8936213 A 2016-03-26 4.207123 52.05226
6 6 0.2559923 D 2016-01-13 6.602886 51.06076
# script voor test df
n=10^3
df <- data.frame(
ID=c(1:n),
var2=runif(n),
var1=sample(LETTERS[1:4], n, replace=TRUE),
dates=sample(seq(as.Date('2016/01/01'), as.Date('2017/01/01'), by="day"), n, replace=TRUE),
X = runif(n, min= 3.36,max= 7.23), # why this min/max?
Y = runif(n, min=50.72,max=53.55)
)
library(sf)
df.sf <- st_as_sf(df,coords=c('X','Y'))
df.sf
Simple feature collection with 1000 features and 4 fields
geometry type: POINT
dimension: XY
bbox: xmin: 3.36023 ymin: 50.72252 xmax: 7.221368 ymax: 53.54322
epsg (SRID): NA
proj4string: NA
First 10 features:
ID var2 var1 dates geometry
1 1 0.1911551 B 2016-03-19 POINT (5.944756 50.74802)
2 2 0.4594477 B 2016-06-02 POINT (5.355002 51.36075)
3 3 0.6065719 A 2016-08-22 POINT (4.910865 52.96492)
4 4 0.5212724 C 2016-06-21 POINT (6.172521 52.09474)
5 5 0.0396683 A 2016-06-16 POINT (5.321477 52.37966)
6 6 0.2919543 C 2016-03-13 POINT (4.474442 52.23473)
7 7 0.3235874 C 2016-11-20 POINT (7.099024 53.25873)
8 8 0.9087451 A 2016-03-17 POINT (5.424322 51.55503)
9 9 0.4612557 A 2016-08-02 POINT (4.036561 50.73775)
10 10 0.7023028 A 2016-06-14 POINT (4.471334 52.10025)
#st_as_sf(df,coords=c('X','Y','var2'), dim="XYZ")
#head(df.sf,n=2)
#head(df,n=2)
mapprojection - metadata
map projections in PROJ.4 and import/export through GDAL
for Netherlands:
in sf map projections through CRS (Coordinate Reference System) at http://spatialreference.org/
df.sf <- st_as_sf(df,coords=c('X','Y'), crs=4326)
df.sf
Simple feature collection with 1000 features and 4 fields
geometry type: POINT
dimension: XY
bbox: xmin: 3.36023 ymin: 50.72252 xmax: 7.221368 ymax: 53.54322
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
First 10 features:
ID var2 var1 dates geometry
1 1 0.1911551 B 2016-03-19 POINT (5.944756 50.74802)
2 2 0.4594477 B 2016-06-02 POINT (5.355002 51.36075)
3 3 0.6065719 A 2016-08-22 POINT (4.910865 52.96492)
4 4 0.5212724 C 2016-06-21 POINT (6.172521 52.09474)
5 5 0.0396683 A 2016-06-16 POINT (5.321477 52.37966)
6 6 0.2919543 C 2016-03-13 POINT (4.474442 52.23473)
7 7 0.3235874 C 2016-11-20 POINT (7.099024 53.25873)
8 8 0.9087451 A 2016-03-17 POINT (5.424322 51.55503)
9 9 0.4612557 A 2016-08-02 POINT (4.036561 50.73775)
10 10 0.7023028 A 2016-06-14 POINT (4.471334 52.10025)
str(df.sf) # str : shows structure of object
Classes 'sf' and 'data.frame': 1000 obs. of 5 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ var2 : num 0.1912 0.4594 0.6066 0.5213 0.0397 ...
$ var1 : Factor w/ 4 levels "A","B","C","D": 2 2 1 3 1 3 3 1 1 1 ...
$ dates : Date, format: "2016-03-19" "2016-06-02" ...
$ geometry:sfc_POINT of length 1000; first list element: Classes 'XY', 'POINT', 'sfg' num [1:2] 5.94 50.75
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA
..- attr(*, "names")= chr "ID" "var2" "var1" "dates"
str(df)
'data.frame': 1000 obs. of 6 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ var2 : num 0.1912 0.4594 0.6066 0.5213 0.0397 ...
$ var1 : Factor w/ 4 levels "A","B","C","D": 2 2 1 3 1 3 3 1 1 1 ...
$ dates: Date, format: "2016-03-19" "2016-06-02" ...
$ X : num 5.94 5.36 4.91 6.17 5.32 ...
$ Y : num 50.7 51.4 53 52.1 52.4 ...
objects keep 'dataframe' class
df.sf[1:3,] # first three rows of dataframe
Simple feature collection with 3 features and 4 fields
geometry type: POINT
dimension: XY
bbox: xmin: 4.910865 ymin: 50.74802 xmax: 5.944756 ymax: 52.96492
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
ID var2 var1 dates geometry
1 1 0.1911551 B 2016-03-19 POINT (5.944756 50.74802)
2 2 0.4594477 B 2016-06-02 POINT (5.355002 51.36075)
3 3 0.6065719 A 2016-08-22 POINT (4.910865 52.96492)
#df.sf[,3]
df.sf[1:3,2:3] # first three rows and column 2 and 3 only
Simple feature collection with 3 features and 2 fields
geometry type: POINT
dimension: XY
bbox: xmin: 4.910865 ymin: 50.74802 xmax: 5.944756 ymax: 52.96492
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
var2 var1 geometry
1 0.1911551 B POINT (5.944756 50.74802)
2 0.4594477 B POINT (5.355002 51.36075)
3 0.6065719 A POINT (4.910865 52.96492)
library(mapview) # R wrapper for leaflet .....
#mapview(df.sf)
mapview(df.sf,zcol="var1")
#library(tmap)
#tmap_mode("view")
#tm_shape(df.sf) + tm_dots(col="black", size=.1)
do stuff yourself ….
…coffee break…
maps downloaded from web CBS : buurt_2017.zip
list.files("../dataUtrecht/buurt_2017")
[1] "buurt_2017.cpg" "buurt_2017.dbf" "buurt_2017.prj"
[4] "buurt_2017.shp" "buurt_2017.shp.xml" "buurt_2017.shx"
[7] "gem_2017.cpg" "gem_2017.dbf" "gem_2017.prj"
[10] "gem_2017.shp" "gem_2017.shp.xml" "gem_2017.shx"
[13] "wijk_2017.cpg" "wijk_2017.dbf" "wijk_2017.prj"
[16] "wijk_2017.shp" "wijk_2017.shp.xml" "wijk_2017.shx"
Utrecht : bomenkaart.zip
list.files("../dataUtrecht/bomenkaart")
[1] "Bomen_GISIB_ArcGISonline.dbf" "Bomen_GISIB_ArcGISonline.prj"
[3] "Bomen_GISIB_ArcGISonline.sbn" "Bomen_GISIB_ArcGISonline.sbx"
[5] "Bomen_GISIB_ArcGISonline.shp" "Bomen_GISIB_ArcGISonline.shx"
library(sf)
library(tidyverse)
buurt.sf <- st_read("../dataUtrecht/buurt_2017/buurt_2017.shp")
Reading layer `buurt_2017' from data source `L:\GreenStat\projecten\2017-X03 maptime 030 R GIS\dataUtrecht\buurt_2017\buurt_2017.shp' using driver `ESRI Shapefile'
Simple feature collection with 13308 features and 39 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 10425.16 ymin: 306846.2 xmax: 278026.1 ymax: 621876.3
epsg (SRID): NA
proj4string: +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs
buurt.sf %>% head(n=3)
Simple feature collection with 3 features and 39 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 251260.5 ymin: 592402 xmax: 254809.1 ymax: 594573.5
epsg (SRID): NA
proj4string: +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs
BU_CODE BU_NAAM WK_CODE GM_CODE GM_NAAM IND_WBI WATER
1 BU00030000 Appingedam-Centrum WK000300 GM0003 Appingedam 1 NEE
2 BU00030001 Appingedam-West WK000300 GM0003 Appingedam 1 NEE
3 BU00030002 Appingedam-Oost WK000300 GM0003 Appingedam 1 NEE
POSTCODE DEK_PERC OAD STED AANT_INW AANT_MAN AANT_VROUW P_00_14_JR
1 9901 1 1190 3 2335 1090 1245 10
2 9903 5 894 4 3080 1535 1545 17
3 9902 1 1112 3 5955 2865 3090 16
P_15_24_JR P_25_44_JR P_45_64_JR P_65_EO_JR P_ONGEHUWD P_GEHUWD
1 9 21 30 30 40 36
2 11 20 33 19 43 47
3 11 22 27 25 43 41
P_GESCHEID P_VERWEDUW BEV_DICHTH AANTAL_HH P_EENP_HH P_HH_Z_K P_HH_M_K
1 11 12 2774 1310 54 28 18
2 7 4 1950 1335 27 37 36
3 9 8 2094 2735 35 31 34
GEM_HH_GR P_WEST_AL P_N_W_AL P_MAROKKO P_ANT_ARU P_SURINAM P_TURKIJE
1 1.7 6 4 0 1 0 1
2 2.3 6 3 0 1 0 1
3 2.1 9 9 1 1 1 4
P_OVER_NW OPP_TOT OPP_LAND OPP_WATER geometry
1 2 90 84 5 MULTIPOLYGON (((253641.5 59...
2 1 163 158 5 MULTIPOLYGON (((251260.5 59...
3 3 295 284 11 MULTIPOLYGON (((254580.7 59...
library(sf)
library(tidyverse)
buurt.sf %>% str() # show structure of object
Classes 'sf' and 'data.frame': 13308 obs. of 40 variables:
$ BU_CODE : Factor w/ 13308 levels "BU00030000","BU00030001",..: 1 2 3 4 5 6 7 8 9 10 ...
$ BU_NAAM : Factor w/ 12251 levels "'n Oaln Diek",..: 372 374 373 9715 10782 10735 786 9585 12185 11064 ...
$ WK_CODE : Factor w/ 3159 levels "WK000300","WK000500",..: 1 1 1 1 1 1 2 2 2 2 ...
$ GM_CODE : Factor w/ 389 levels "GM0003","GM0005",..: 1 1 1 1 1 1 2 2 2 2 ...
$ GM_NAAM : Factor w/ 388 levels "'s-Gravenhage",..: 20 20 20 20 20 20 28 28 28 28 ...
$ IND_WBI : num 1 1 1 1 1 1 1 1 1 1 ...
$ WATER : Factor w/ 3 levels "B","JA","NEE": 3 3 3 3 3 3 3 3 3 3 ...
$ POSTCODE : Factor w/ 3907 levels "-99999999","1011",..: 3830 3832 3831 3832 3831 3830 3782 3782 3784 3784 ...
$ DEK_PERC : num 1 5 1 4 1 1 1 1 1 1 ...
$ OAD : num 1190 894 1112 351 74 ...
$ STED : num 3 4 3 5 5 5 4 5 5 5 ...
$ AANT_INW : num 2335 3080 5955 320 100 ...
$ AANT_MAN : num 1090 1535 2865 165 50 ...
$ AANT_VROUW: num 1245 1545 3090 150 50 ...
$ P_00_14_JR: num 10 17 16 21 17 24 16 21 14 17 ...
$ P_15_24_JR: num 9 11 11 11 8 11 12 12 11 10 ...
$ P_25_44_JR: num 21 20 22 23 14 15 21 27 17 16 ...
$ P_45_64_JR: num 30 33 27 35 45 33 29 26 38 39 ...
$ P_65_EO_JR: num 30 19 25 10 17 17 22 13 21 17 ...
$ P_ONGEHUWD: num 40 43 43 50 40 46 43 48 43 41 ...
$ P_GEHUWD : num 36 47 41 44 55 47 45 47 45 51 ...
$ P_GESCHEID: num 11 7 9 3 3 6 6 4 8 7 ...
$ P_VERWEDUW: num 12 4 8 2 2 1 6 1 4 2 ...
$ BEV_DICHTH: num 2774 1950 2094 59 18 ...
$ AANTAL_HH : num 1310 1335 2735 115 40 ...
$ P_EENP_HH : num 54 27 35 18 15 21 30 18 34 19 ...
$ P_HH_Z_K : num 28 37 31 32 48 37 34 38 37 46 ...
$ P_HH_M_K : num 18 36 34 50 38 42 36 45 29 35 ...
$ GEM_HH_GR : num 1.7 2.3 2.1 2.7 2.5 2.7 2.3 2.7 2.2 2.5 ...
$ P_WEST_AL : num 6 6 9 6 2 8 4 6 5 5 ...
$ P_N_W_AL : num 4 3 9 0 1 1 3 2 4 4 ...
$ P_MAROKKO : num 0e+00 0e+00 1e+00 -1e+08 -1e+08 ...
$ P_ANT_ARU : num 1e+00 1e+00 1e+00 -1e+08 -1e+08 ...
$ P_SURINAM : num 0e+00 0e+00 1e+00 -1e+08 -1e+08 ...
$ P_TURKIJE : num 1e+00 1e+00 4e+00 -1e+08 -1e+08 ...
$ P_OVER_NW : num 2e+00 1e+00 3e+00 -1e+08 -1e+08 ...
$ OPP_TOT : num 90 163 295 559 582 769 313 2190 74 699 ...
$ OPP_LAND : num 84 158 284 540 554 ...
$ OPP_WATER : num 5 5 11 18 28 13 5 14 3 6 ...
$ geometry :sfc_MULTIPOLYGON of length 13308; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:72, 1:2] 253642 253617 253599 253593 253602 ...
..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr "BU_CODE" "BU_NAAM" "WK_CODE" "GM_CODE" ...
library(sf)
library(tidyverse)
u.buurt.sf <-
buurt.sf %>% filter(GM_NAAM=='Utrecht') %>% select(GM_NAAM,BU_NAAM,AANT_INW)
u.buurt.sf %>% head(n=3)
Simple feature collection with 3 features and 3 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 133872 ymin: 454563.4 xmax: 135198.4 ymax: 456063.6
epsg (SRID): NA
proj4string: +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs
GM_NAAM BU_NAAM AANT_INW geometry
1 Utrecht Welgelegen, Den Hommel 1395 MULTIPOLYGON (((135198.4 45...
2 Utrecht Oog in Al 4280 MULTIPOLYGON (((134877.6 45...
3 Utrecht Halve Maan-Zuid 1435 MULTIPOLYGON (((134161.3 45...
library(sf)
library(tidyverse)
library(mapview)
u.buurt.sf %>% mapview()
#u.buurt.sf %>% mapview(zcol="BU_NAAM")
u.buurt.sf %>% mapview(fill=NA)
library(sf)
library(tidyverse)
bomen.sf <- st_read("../dataUtrecht/bomenkaart/Bomen_GISIB_ArcGISonline.shp")
Reading layer `Bomen_GISIB_ArcGISonline' from data source `L:\GreenStat\projecten\2017-X03 maptime 030 R GIS\dataUtrecht\bomenkaart\Bomen_GISIB_ArcGISonline.shp' using driver `ESRI Shapefile'
Simple feature collection with 168736 features and 10 fields
geometry type: POINT
dimension: XY
bbox: xmin: 126735.1 ymin: 448930 xmax: 141827.3 ymax: 461162.1
epsg (SRID): NA
proj4string: +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs
bomen.sf %>% head(n=3)
Simple feature collection with 3 features and 10 fields
geometry type: POINT
dimension: XY
bbox: xmin: 134185.4 ymin: 457536.2 xmax: 134346.6 ymax: 458042.9
epsg (SRID): NA
proj4string: +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs
Naam_NL Naam_Wet Plantjaar Eigenaar
1 Kers Prunus 'Umineko' 2017 Gemeentelijk
2 Japanse sierkers Prunus serrulata 2017 Gemeentelijk
3 Schietwilg Salix alba NA Niet gemeentelijk
Buurt Wijk Boomnr Boomnr_Oud X_coordina Y_coordina
1 Elinkwijk Noordwest 2938159 <NA> 134185 457957
2 Edisonstraat e.o. Noordwest 2938160 <NA> 134292 458042
3 Schaepmanplein e.o. Noordwest 2938161 <NA> 134346 457536
geometry
1 POINT (134185.4 457957.2)
2 POINT (134292.8 458042.9)
3 POINT (134346.6 457536.2)
use of 'pipes' : '%>%'
select : select column
filter : filter rows
mutate : add column (and mutate value)
summarize : aggregate
library(tidyverse)
df <- df %>% mutate(maand=format.Date(dates, "%m"))
df %>% head()
ID var2 var1 dates X Y maand
1 1 0.1911551 B 2016-03-19 5.944756 50.74802 03
2 2 0.4594477 B 2016-06-02 5.355002 51.36075 06
3 3 0.6065719 A 2016-08-22 4.910865 52.96492 08
4 4 0.5212724 C 2016-06-21 6.172521 52.09474 06
5 5 0.0396683 A 2016-06-16 5.321477 52.37966 06
6 6 0.2919543 C 2016-03-13 4.474442 52.23473 03
df %>% filter(var1=="B" ) %>% head()
ID var2 var1 dates X Y maand
1 1 0.1911551 B 2016-03-19 5.944756 50.74802 03
2 2 0.4594477 B 2016-06-02 5.355002 51.36075 06
3 12 0.1905957 B 2016-07-11 6.935469 51.74871 07
4 14 0.3848420 B 2016-01-02 5.704743 52.39669 01
5 15 0.7399423 B 2016-10-04 6.705704 51.05816 10
6 18 0.3100369 B 2016-08-01 5.364002 50.79384 08
df %>%
ggplot(aes(dates,ID)) +
geom_point() +
theme(text = element_text(size = 25))
df %>%
ggplot(aes(maand,var2, group=maand)) +
geom_boxplot()+
theme(text = element_text(size = 25))
df %>% ggplot(aes(var1,var2)) +
geom_boxplot() +
facet_wrap(~maand) + # conditional plot
theme(text = element_text(size = 25))
df %>% ggplot(aes(maand,var2)) +
geom_boxplot() +
facet_wrap(~var1)+
theme(text = element_text(size = 25))
df.sf[1:3,]
Simple feature collection with 3 features and 4 fields
geometry type: POINT
dimension: XY
bbox: xmin: 4.910865 ymin: 50.74802 xmax: 5.944756 ymax: 52.96492
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
ID var2 var1 dates geometry
1 1 0.1911551 B 2016-03-19 POINT (5.944756 50.74802)
2 2 0.4594477 B 2016-06-02 POINT (5.355002 51.36075)
3 3 0.6065719 A 2016-08-22 POINT (4.910865 52.96492)
# selection returns sf object ..
#df.sf[,3]
#df.sf[1:3,2:3]
mapview(nl01.sf) + df.sf[1:10,"var1"]
#
library(mapview)
library(sf)
library(tidyverse)
# select by spatial object .....!
df.sf[nl01.sf,]
mapview(nl01.sf) + df.sf[nl01.sf,]
#what if ?
# nl01.sf[df.sf,]
#nl01.sf[nl01.sf$NAME_1 =="Noord-Brabant",]
nbr.sf <-
nl01.sf %>% filter(NAME_1 =="Noord-Brabant")
df.sf[nbr.sf,]
mapview( nbr.sf)+ df.sf[nbr.sf,]
# neat !